In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA on noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted by hrLDA are highly competitive with ontologies created by domain experts.